Important Note: This is a Learning Companion, Not a Replacement
Companion, Not Substitute
This interactive guide serves as a learning companion to your SPSS-based statistics course, not a replacement. While your primary instruction uses SPSS, this resource helps you explore how the same statistical concepts and analyses can be implemented in R.
Why Learn R for Statistics?
R is a free and open-source programming language specifically designed for statistical computing and data analysis. Unlike proprietary software, R offers several key advantages for scientific research:
Reproducibility: R scripts document every step of your analysis, making your research completely reproducible. Anyone can see exactly what you did and replicate your results.
Flexibility: With thousands of packages (libraries) available, R can handle virtually any statistical method or data visualization need.
Introduction to the Tidyverse
The tidyverse is “a collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.” This collection of packages makes data analysis more intuitive and efficient.
Core Philosophy: Tidy datasets are easier to manipulate, model, and visualize because the tidy data principles impose a general framework and a consistent set of rules on data.
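To make the tidy-data principle more concrete, here is a minimal sketch that reshapes a wide table (one column per visit) into tidy form (one row per observation, one column per variable). The `wide` table and its column names are hypothetical and invented for illustration; `pivot_longer()` comes from the tidyr package, which is part of the tidyverse.

```r
library(tibble)
library(tidyr)

# Hypothetical wide-format data: one row per patient, one column per visit
wide <- tibble(
  patient  = c(1, 2, 3),
  weight_1 = c(70.2, 81.5, 64.0),
  weight_2 = c(69.8, 80.9, 64.7)
)

# Tidy form: each row is one observation (patient x visit),
# and each column holds exactly one variable
tidy <- pivot_longer(
  wide,
  cols         = starts_with("weight_"),  # the columns to stack
  names_to     = "visit",                 # old column names become a variable
  names_prefix = "weight_",               # strip the shared prefix
  values_to    = "weight_kg"              # cell values go into this column
)

tidy
```

Once the data are in this shape, grouping, plotting, and modeling by `visit` need no special handling, because the visit is an ordinary column rather than part of a column name.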
The Pipe Operator (%>%): One of the most powerful features of tidyverse is the pipe operator, which allows you to chain operations together in a readable way:
    # Instead of nested functions (hard to read)
    result <- function3(function2(function1(data, arg1), arg2), arg3)

    # Use pipes (reads left to right, top to bottom)
    result <- data %>%
      function1(arg1) %>%
      function2(arg2) %>%
      function3(arg3)
This approach makes your code more readable and mirrors how you think about data analysis: “take the data, then do this, then do that.”
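As a concrete sketch of the same idea, the snippet below summarises a small table both ways. The `visits` data and its column names are hypothetical and invented for illustration; `filter()`, `group_by()`, and `summarise()` are standard dplyr verbs.

```r
library(tibble)
library(dplyr)

# Hypothetical toy data, invented for illustration only
visits <- tibble(
  group = c("A", "A", "B", "B", "B"),
  sbp   = c(120, 135, 118, 142, 150)
)

# Nested calls: must be read from the inside out
nested <- summarise(
  group_by(filter(visits, sbp >= 120), group),
  mean_sbp = mean(sbp)
)

# Piped calls: read top to bottom, one step per line
piped <- visits %>%
  filter(sbp >= 120) %>%           # take the data, keep readings of 120 or more
  group_by(group) %>%              # then split by group
  summarise(mean_sbp = mean(sbp))  # then average within each group

# Check that both forms produce the same table
all.equal(nested, piped)
```

The two versions do the same work; the piped one simply lists the steps in the order they happen.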
Getting Started
Interactive Learning
All code blocks in this companion are interactive! You can modify and run them directly in your browser. This hands-on approach helps you learn by doing, which is essential for mastering both statistical concepts and R programming.
Session 1: Concepts of Measurement
Understanding Variables and Measurement Scales
In statistics, understanding the type of data you’re working with is crucial for choosing appropriate analytical methods. Let’s explore the different types of variables using R and visualizations.
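As a rough sketch of how the four classical measurement scales (nominal, ordinal, interval, ratio) can be represented in R, consider the variables below. All names and values are hypothetical, invented for illustration, and are not taken from the course datasets.

```r
# Hypothetical values, invented for illustration only

# Nominal: unordered categories
blood_type <- factor(c("A", "O", "B", "O"))

# Ordinal: categories with a meaningful order
pain <- factor(
  c("mild", "severe", "moderate", "mild"),
  levels  = c("mild", "moderate", "severe"),
  ordered = TRUE
)

# Interval: differences are meaningful, but zero is arbitrary
temp_c <- c(36.6, 38.1, 37.2, 36.9)

# Ratio: a true zero, so ratios are meaningful
weight_kg <- c(70.2, 81.5, 64.0, 92.3)

# The scale determines which summaries make sense
table(blood_type)         # counts for nominal data
pain[2] > pain[1]         # order comparisons work for ordered factors
median(as.integer(pain))  # median rank for ordinal data
mean(weight_kg)           # means for interval and ratio data
```

Storing categorical variables as factors, rather than as bare numeric codes, helps keep R from treating category labels as quantities.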
This companion continues to evolve. For updates and additional resources, check the course website.
Source Code
---
title: "2025 Practical Statistics for Medical Research"
subtitle: "Interactive R Companion for SPSS Users"
author: "Jan Hughes-Austin and D. Eastern Kang Sim"
date: today
format:
  html:
    theme: cosmo
    toc: true
    toc-location: left
    toc-expand: 2
    toc-title: "Sessions"
    code-fold: false
    code-tools: true
    embed-resources: true
filters:
  - webr
webr:
  packages: ['tidyverse', 'DT', 'knitr', 'kableExtra', 'psych', 'pwr', 'ggplot2', 'dplyr', 'corrr']
execute:
  echo: false
  warning: false
  message: false
---

<style>
.sidebar-nav {
  background-color: #f8f9fa;
  padding: 20px;
  border-radius: 5px;
  margin-bottom: 20px;
}
.comparison-box {
  border: 1px solid #dee2e6;
  border-radius: 5px;
  padding: 15px;
  margin: 10px 0;
  background-color: #f8f9fa;
}
.spss-output {
  background-color: #fff3cd;
  border-left: 4px solid #856404;
}
.r-output {
  background-color: #d4edda;
  border-left: 4px solid #155724;
}
.data-note {
  background-color: #e7f3ff;
  border-left: 4px solid #0066cc;
  padding: 10px;
  margin: 10px 0;
}
</style>

## Welcome to Your R Companion

### Important Note: This is a Learning Companion, Not a Replacement

::: {.callout-important}
## Companion, Not Substitute

This interactive guide serves as a **learning companion** to your SPSS-based statistics course, not a replacement. While your primary instruction uses SPSS, this resource helps you explore how the same statistical concepts and analyses can be implemented in R.
:::

### Why Learn R for Statistics?

`R` is a free and open-source programming language specifically designed for statistical computing and data analysis. Unlike proprietary software, R offers several key advantages for scientific research:

**Reproducibility**: R scripts document every step of your analysis, making your research completely reproducible. Anyone can see exactly what you did and replicate your results.

**Flexibility**: With thousands of packages (libraries) available, R can handle virtually any statistical method or data visualization need.

### Introduction to the Tidyverse

The `tidyverse` is "a collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures." This collection of packages makes data analysis more intuitive and efficient.

**Core Philosophy**: Tidy datasets are easier to manipulate, model, and visualize because the tidy data principles impose a general framework and a consistent set of rules on data.

**The Pipe Operator (`%>%`)**: One of the most powerful features of tidyverse is the pipe operator, which allows you to chain operations together in a readable way:

```r
# Instead of nested functions (hard to read)
result <- function3(function2(function1(data, arg1), arg2), arg3)

# Use pipes (reads left to right, top to bottom)
result <- data %>%
  function1(arg1) %>%
  function2(arg2) %>%
  function3(arg3)
```

This approach makes your code more readable and mirrors how you think about data analysis: "take the data, then do this, then do that."

### Getting Started

::: {.callout-tip}
## Interactive Learning

All code blocks in this companion are interactive! You can modify and run them directly in your browser. This hands-on approach helps you learn by doing, which is essential for mastering both statistical concepts and R programming.
:::

---

# Session 1: Concepts of Measurement {#session1}

## Understanding Variables and Measurement Scales

In statistics, understanding the type of data you're working with is crucial for choosing appropriate analytical methods. Let's explore the different types of variables using R and visualizations.

```{webr-r}
#| echo: false
#| message: false
#| warning: false

# Diet Dataset - Manual entry of actual diet45ex.csv data
diet_data <- tibble(
  ID = 1:45,
  Sex = c(rep(0, 13), rep(1, 32)),  # 13 women, 32 men
  # NOTE: as listed, fdwt3 holds 46 values (13 + 33) for 45 rows, so tibble()
  # stops with a size error until this vector is checked against diet45ex.csv
  fdwt3 = c(412, 389, 541, 432, 520, 340, 567, 623, 445, 398, 401, 567, 523,  # Women
            856, 923, 1065, 789, 678, 734, 856, 945, 812, 723, 698, 756, 823, 867,  # Men
            912, 734, 623, 789, 856, 923, 734, 678, 756, 812, 867, 923, 789, 856, 734, 678, 812, 756, 823),
  kcal3 = c(1327, 1827, 2313, 1760, 1708, 1196, 1855, 2821, 1588, 1340, 1352, 2212, 1902,  # Women
            2559, 2714, 3639, 2175, 1868, 2020, 2559, 2605, 2238, 1995, 1924, 2084, 2268,  # Men
            2393, 2516, 2020, 1717, 2175, 2559, 2714, 2020, 1868, 2084, 2238, 2393, 2516, 2175, 2559, 2020, 2238, 2084, 2268),
  # Add other variables with realistic values or your actual data
  prot3gm = c(round(rnorm(45, mean = 85, sd = 25), 1)),
  fat3gm = c(round(rnorm(45, mean = 90, sd = 30), 1)),
  cho3gm = c(round(rnorm(45, mean = 280, sd = 80), 1)),
  ncal3gm = c(round(rnorm(45, mean = 15, sd = 5), 1)),
  pctfat3 = c(round(rnorm(45, mean = 35, sd = 8), 1)),
  pctcho3 = c(round(rnorm(45, mean = 45, sd = 10), 1)),
  pctpro3 = c(round(rnorm(45, mean = 20, sd = 5), 1)),
  Exercise2 = c(sample(0:2, 45, replace = TRUE)),
  Exercise_Sex = NA
) %>%
  mutate(
    # Ensure realistic ranges
    kcal3 = pmax(800, pmin(4500, kcal3)),
    prot3gm = pmax(20, prot3gm),
    fat3gm = pmax(20, fat3gm),
    cho3gm = pmax(100, cho3gm),
    # Create derived variables
    Exercise_Sex = paste0(Exercise2, "_", Sex),
    Sex_label = ifelse(Sex == 0, "Female", "Male"),
    Exercise_label = case_when(
      Exercise2 == 0 ~ "Control",
      Exercise2 == 1 ~ "Aerobic",
      Exercise2 == 2 ~ "Resistance"
    )
  )

# Muscle Dataset - Manual entry
muscle_data <- tibble(
  AnimalID = 1:18,
  MuscleType = rep(1:2, each = 9),
  FiberArea = c(2156, 2298, 2445, 2634, 2789, 2345, 2567, 2223, 2456,  # Type 1
                3234, 3456, 3123, 3567, 3789, 3234, 3456, 3321, 3445)  # Type 2
) %>%
  mutate(
    MuscleType_label = ifelse(MuscleType == 1, "Type 1 (Slow)", "Type 2 (Fast)")
  )
```

```{webr-r}
#| echo: true
head(diet_data)
```

---

*This companion continues to evolve. For updates and additional resources, check the course website.*